Generic Analysis and Methods for Computing Skyline Variants

Authors

  • Zhenjie Zhang (Department of Computer Science, School of Computing, National University of Singapore)
  • Hua Lu (Department of Computer Science, Faculties of Engineering, Science, and Medicine, Aalborg University)
  • Beng Chin Ooi (Department of Computer Science, School of Computing, National University of Singapore)
  • Anthony K. H. Tung (Department of Computer Science, School of Computing, National University of Singapore)
Abstract

Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, a point p is said to dominate another point q if p is no worse than q on every dimension and better on at least one dimension. The skyline of a data set therefore consists of all points not dominated by any other point. To better cater to application requirements, such as controlling the size of the skyline or handling data sets that are not well structured, various works have extended the definition of skyline based on variants of the dominance relationship. However, it is difficult to implement each of these variants separately in a system setting; effort must instead be made to provide a general framework over which these specific implementations can be easily materialized. In this paper, a generalized framework is proposed for this purpose. Our framework explicitly and carefully examines the various properties that should be preserved in a variant of the dominance relationship so that (1) the original advantages of skyline are maintained while adaptivity to application semantics is also catered to, and (2) computational complexity is almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties, and present some new dominance relationships to illustrate that other skyline variants always trade off by relaxing some of the properties. We then develop generic algorithms that compute skyline variants when certain properties are relaxed, and illustrate the use of our framework in computing skylines over data sets with missing values. Extensive experimental results are presented to evaluate the efficiency and effectiveness of our framework.
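
The traditional dominance relationship and the skyline it induces, as defined in the abstract, can be sketched in a few lines of Python. This is only an illustrative, quadratic-time sketch and not the generic algorithms proposed in the paper. The function names `dominates` and `skyline` are hypothetical, and the convention that a smaller value is better on every dimension is an assumption made for the example.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q under traditional dominance.

    Assumption for this sketch: smaller values are better on every dimension.
    p dominates q if p is no worse than q on every dimension and strictly
    better on at least one.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    no_worse_everywhere = all(a <= b for a, b in zip(p, q))
    better_somewhere = any(a < b for a, b in zip(p, q))
    return no_worse_everywhere and better_somewhere


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points not dominated by any other point (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical data: (price, distance to the beach), both to be minimized.
    hotels = [(50, 8), (60, 3), (70, 5), (90, 1), (55, 9)]
    print(skyline(hotels))
```

In this hypothetical data set, (70, 5) is dominated by (60, 3) and (55, 9) is dominated by (50, 8), so neither belongs to the skyline. A point never dominates itself, because the "strictly better on at least one dimension" condition fails.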

Similar resources

Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce

The skyline operator and its variants such as dynamic skyline and reverse skyline operators have attracted considerable attention recently due to their broad applications. However, computations of such operators are challenging today since there is an increasing trend of applications to deal with big data. For such data-intensive applications, the MapReduce framework has been widely used recent...

Quantitative analysis Skyline In urbanscapes As the edge of the city (With emphasis on new tools of urban landscape's Analyzes)

Skyline is an important factor in explaining the locative characteristics and qualitative properties of the urbanscape, as an indicator of improved urban life. This study has used one of the contemporary urban studies methods, because of its appropriate and functional capabilities, and analyzes its impact on the urbanscape. This analysis is done based on the definitions in the field of urbanscape ...

Finding Superior Skyline Points from Incomplete Data

The skyline query has proven to be an important tool in multi-criteria decision making and search space pruning. A skyline query returns the subset of points from a multidimensional dataset that are not dominated by any other point. Due to its wide applications, skyline query and its variants have been extensively studied in the past. However, skyline computation for incomplete domain, where po...

Parallelizing Skyline Queries for Scalable Distribution

Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by le...

SkyDist: Data Mining on Skyline Objects

The skyline operator is a well established database primitive which is traditionally applied in a way that only a single skyline is computed. In this paper we use multiple skylines themselves as objects for data exploration and data mining. We define a novel similarity measure for comparing different skylines, called SkyDist. SkyDist can be used for complex analysis tasks such as clustering, cl...

Publication date: 2009